Run CI on Modal, upgrade Bitsandbytes #641

mryab · 2025-02-10T01:37:27Z

This PR moves test execution from GitHub Actions to Modal, which makes it possible to use GPUs in those tests in the future. Since Modal workers can have multiple CPUs, we can also run tests in parallel, which speeds them up by approximately 4x, from 8 minutes to under 2 minutes.

Because several tests appear to be unstable, this PR disables them for the time being or marks them as flaky with pytest.mark.xfail. They will be fixed in future PRs.
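For readers unfamiliar with the marker, here is a minimal sketch of marking a test as flaky with `pytest.mark.xfail`. The test name, the reason string, and the random failure are hypothetical and only illustrate the pattern; they are not taken from the hivemind test suite.

```python
import random

import pytest


# Hypothetical flaky test: with strict=False, a failure is reported as XFAIL
# and an unexpected pass is reported as XPASS, so neither outcome fails the run.
@pytest.mark.xfail(reason="Hypothetical flaky test: outcome depends on timing", strict=False)
def test_sometimes_fails():
    # Stand-in for a timing-dependent check that fails occasionally.
    assert random.random() < 0.9
```

The trade-off of this approach is that a test marked this way no longer blocks CI when it breaks for a real reason, which is presumably why the description treats it as a temporary measure.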

Lastly, since the current version of Bitsandbytes is outdated, the PR upgrades it.

codecov · 2025-02-23T23:34:14Z

Codecov Report

Attention: Patch coverage is 74.07407% with 7 lines in your changes missing coverage. Please review.

Project coverage is 85.20%. Comparing base (d20e810) to head (cfa51d2).
Report is 24 commits behind head on master.

Files with missing lines	Patch %	Lines
hivemind/moe/client/moe.py	50.00%	6 Missing ⚠️
hivemind/moe/server/runtime.py	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #641      +/-   ##
==========================================
- Coverage   85.39%   85.20%   -0.20%     
==========================================
  Files          81       81              
  Lines        8006     8049      +43     
==========================================
+ Hits         6837     6858      +21     
- Misses       1169     1191      +22

Files with missing lines	Coverage Δ
hivemind/compression/base.py	`94.36% <100.00%> (ø)`
hivemind/compression/quantization.py	`94.53% <100.00%> (+0.31%)`	⬆️
hivemind/moe/server/connection_handler.py	`90.72% <100.00%> (-0.95%)`	⬇️
hivemind/moe/server/runtime.py	`22.40% <0.00%> (-53.21%)`	⬇️
hivemind/moe/client/moe.py	`68.13% <50.00%> (-24.65%)`	⬇️

... and 32 files with indirect coverage changes


mryab · 2025-03-15T09:41:08Z

hivemind/moe/client/moe.py

+        samples_with_tasks = {sample_idx for sample_idx, _ in task_to_indices.values()}
+        pending_samples = len(samples_with_tasks)  # samples for which we have less than k_min results
+        assert pending_samples <= num_samples


It's better to recompute the number of pending_samples, as some samples might have 0 tasks (for example, in test_call_many).
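A minimal standalone sketch of the point made in this comment, assuming that `task_to_indices` maps each task to a `(sample_idx, expert_idx)` pair. The helper name `count_pending_samples` and the example data are hypothetical; only the three lines from the diff above are reproduced from the PR.

```python
from typing import Dict, Hashable, Tuple


def count_pending_samples(task_to_indices: Dict[Hashable, Tuple[int, int]], num_samples: int) -> int:
    """Count only the samples that have at least one task assigned."""
    samples_with_tasks = {sample_idx for sample_idx, _ in task_to_indices.values()}
    pending_samples = len(samples_with_tasks)  # samples that can still receive results
    assert pending_samples <= num_samples
    return pending_samples


# Hypothetical batch of 3 samples in which sample 1 has no tasks at all.
example_tasks = {"task_a": (0, 0), "task_b": (0, 1), "task_c": (2, 0)}
pending = count_pending_samples(example_tasks, num_samples=3)
print(pending)  # prints 2
```

If the counter were initialized to `num_samples` instead, sample 1 would be counted as pending even though no result can ever arrive for it.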

justheuristic

Appreciate the version updates and a face lift to the tests. LGTM; pending @dvmazur


mryab added 30 commits February 10, 2025 01:37

Run CI on Modal, upgrade Bitsandbytes

fc52696

Add docs configuration

58f3d44

Fix formatting

6d36cd1

Configure concurrency for Modal tests

ab714bd

Sort imports

c840ab9

Set up the timeout

f717bf6

Set up concurrency for other actions as well

0dca5a2

Remove concurrency limits

11feccf

Add concurrency, update bitsandbytes in dependencies

cbf4450

Add cache, bump CI versions

4f303bd

Skip test_allreduce_protocol for the time being

6a5ec5e

Reduce the number of CPUs

ba3e386

Decrease the limits in test_dht_connection_successful

1fb8dec

Restore the limits in test_dht_connection_successful

67e040f

Clear the blacklist before attempting store

c0af379

Increase the wait in test_load_state_from_peers

6116570

Parametrize tests by Python version, upload Codecov coverage

801bb4f

Check out and build a specific version of bitsandbytes

fd69b64

Increase the timeouts to account for image builds

22739f5

Introduce timeouts

635879f

Increase the number of CPUs for tests

8fbd9dd

Make tests more robust

d70b4b9

Make tests more robust

4254468

Reformat the code

1753bae

Mark test_client_disconnect as flaky

4753fef

Build and test p2pd separately

9705318

Install Go only for a specific image

ae5ed98

Don't use uv when building p2pd

11eb277

Mark test_dhtnode_blacklist as flaky

9d37fe9

Increase timeouts

7abc9f0

mryab added 12 commits February 23, 2025 19:49

Pass extra environment variables to codecov

66c9187

Remove --dist from codecov run

93460aa

Pass GITHUB_EVENT_PULL_REQUEST_HEAD_SHA when running the test

75529a1

Mark test_fault_tolerance as flaky

83b53bb

Mark test_cli_run_server_identity_path as flaky

5984bad

Disable parallel execution for codecov management

2f67c52

Increase codecov run timeout to 15 minutes

e8efb66

Pass GITHUB_EVENT_PULL_REQUEST_HEAD_SHA to the workflow

f8ad2a8

Pass additional secrets

225439e

Mark one more test as flaky

3695813

Mark another test as flaky

6bac780

Pass codecov values explicitly

3228dfd

mryab added 3 commits February 23, 2025 23:36

Pass --no-use-pep517 to uv pip install

0a9347d

Change uv pip to pip

87f0ece

Extract the blocksize for quantization into a constant

46fa9f5

mryab commented Mar 15, 2025


mryab added 3 commits March 15, 2025 09:41

Fix missing newline

717dd34

Rewrite test_averaging_trigger with time.monotonic

0fcd2ba

Replace os.unlink with os.remove

cfa51d2

justheuristic requested review from dvmazur and justheuristic March 15, 2025 10:16

dvmazur requested review from justheuristic and removed request for justheuristic March 15, 2025 10:16

dvmazur approved these changes Mar 15, 2025


justheuristic approved these changes Mar 15, 2025


mryab merged commit 767afa5 into master Mar 15, 2025
14 checks passed

mryab deleted the modal-ci branch March 15, 2025 10:25

mryab added a commit that referenced this pull request Apr 20, 2025

Run CI on Modal, upgrade Bitsandbytes (#641)

dfd40fd

* Run CI on Modal, upgrade Bitsandbytes
* Extract the blocksize for quantization into a constant

(cherry picked from commit 767afa5)

mryab mentioned this pull request Apr 20, 2025

[BUG] hivemind.compression is not compatible with bitsandbytes == 0.39.1 #572

Closed
